Data Visualisation Workshop
This session will include:
why you should visualise data;
some guidelines for making better charts;
examples of charts made in R;
a chance for you to make charts!
royal-statistical-society.github.io/datavisguide
Data visualisation has two main purposes:
Because summary statistics aren’t enough…
| Dataset | A | B |
|---|---|---|
| mean_x | 54.2632732 | 54.2658818 |
| mean_y | 47.8322528 | 47.8314957 |
| sd_x | 16.7651420 | 16.7688527 |
| sd_y | 26.9354035 | 26.9386081 |
| cor_xy | -0.0644719 | -0.0686092 |
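The same point can be reproduced with data that ships with R. This is a minimal sketch that assumes Anscombe's quartet (the built-in anscombe data frame) as a stand-in; it is not the data behind datasets A and B above, and summarise_xy() is a helper name made up for this example.

```r
# Anscombe's quartet is part of base R (datasets::anscombe).
# Assumption: it stands in for datasets A and B, which are not named on the slide.
summarise_xy <- function(x, y) {
  c(
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )
}

# One column of summary statistics per dataset, one row per statistic.
stats <- cbind(
  set1 = summarise_xy(anscombe$x1, anscombe$y1),
  set2 = summarise_xy(anscombe$x2, anscombe$y2)
)
print(round(stats, 3))

# Plotting the two sets side by side shows what the table hides.
op <- par(mfrow = c(1, 2))
plot(anscombe$x1, anscombe$y1, main = "Set 1", xlab = "x", ylab = "y")
plot(anscombe$x2, anscombe$y2, main = "Set 2", xlab = "x", ylab = "y")
par(op)
```

The summary statistics are nearly identical, yet one scatter plot shows a roughly linear trend and the other a clear curve.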
Grab attention
Visualisations stand out. If a reader is short on time or uncertain about whether a document is of interest, an attention-grabbing visualisation may entice them to start reading.
Improve access to information
Textual descriptions can be lengthy and hard to read, and are frequently less precise than a visual depiction showing data points and axes.
Summarise content
Visual displays allow for summarising complex textual content, aiding the reader in memorising key points.
John Snow collected data on cholera deaths and created a visualisation where the number of deaths was represented by the height of a bar at the corresponding address in London.
This visualisation showed that the deaths clustered around Broad Street, which helped identify the source of the outbreak: the Broad Street water pump.
Snow. 1854.
What is the purpose of the chart?
Source: giphy.com
Should the y axis start at 0?
They don’t always have to start at zero…
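A rough {ggplot2} sketch of the trade-off, using made-up temperature readings (the temps data frame is hypothetical). Bars encode values by length, so they are drawn from zero; a line chart of values far from zero can be easier to read with a truncated axis.

```r
library(ggplot2)

# Hypothetical data: daily body temperature readings in degrees Celsius.
temps <- data.frame(
  day  = 1:7,
  temp = c(36.6, 36.8, 37.1, 37.9, 38.4, 37.6, 36.9)
)

# Line chart: ggplot2 fits the y axis to the data, so the change is visible.
p_line <- ggplot(temps, aes(x = day, y = temp)) +
  geom_line() +
  geom_point()

# The same chart forced to include zero: the variation is flattened.
p_line_zero <- p_line + expand_limits(y = 0)

# Bar chart: bar length encodes the value, and geom_col() draws bars from zero.
p_bar <- ggplot(temps, aes(x = day, y = temp)) +
  geom_col()

print(p_line)
print(p_line_zero)
print(p_bar)
```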
Order categories appropriately…
Order based on magnitude unless the category order has meaning…
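One possible way to do both in {ggplot2}, sketched with made-up counts (the counts and ratings data frames are hypothetical): reorder() sorts unordered categories by value, while explicit factor levels preserve an order that carries meaning.

```r
library(ggplot2)

# Hypothetical counts for categories with no natural order.
counts <- data.frame(
  category = c("Bus", "Car", "Cycle", "Walk"),
  n        = c(12, 30, 7, 21)
)

# Order by magnitude with base R's reorder(); the largest bar ends up at the top.
p_magnitude <- ggplot(counts, aes(x = n, y = reorder(category, n))) +
  geom_col() +
  labs(y = NULL)

# Categories with a meaningful order: set the factor levels explicitly
# so that they are not sorted alphabetically.
ratings <- data.frame(
  rating = factor(c("Low", "Medium", "High"), levels = c("Low", "Medium", "High")),
  n      = c(8, 15, 5)
)
p_ordinal <- ggplot(ratings, aes(x = rating, y = n)) +
  geom_col()

print(p_magnitude)
print(p_ordinal)
```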
Source: Georgia Department of Public Health
Why use colours in data visualisation?
Colours should serve a purpose, e.g. discerning groups of data
Colours can highlight or emphasise parts of your data.
Colour is not always the most effective choice, e.g. for communicating differences between variables.
Different types of colour palettes…
… for different types of data.
Is this a good choice of colour?
Check that plots are colourblind-friendly with colorblindr::cvd_grid(g), where g is a ggplot object.
In base R via {RColorBrewer}: brewer.pal(4, "Dark2")
In {ggplot2}: scale_fill_brewer(), scale_fill_distiller(), scale_colour_brewer(), scale_colour_distiller().
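A short sketch of these functions in use. The example data (mpg and faithfuld, both shipped with {ggplot2}) and the "Blues" palette are choices made for illustration only.

```r
library(RColorBrewer)
library(ggplot2)

# A qualitative palette for unordered groups, returned as hex codes.
pal <- brewer.pal(4, "Dark2")

# Base R: pass the hex codes to the col argument.
barplot(c(3, 5, 2, 4), col = pal, names.arg = c("a", "b", "c", "d"))

# ggplot2, discrete fill: scale_fill_brewer() with the same palette.
p_discrete <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")

# ggplot2, continuous fill: scale_fill_distiller() interpolates a
# ColorBrewer palette into a smooth gradient.
p_continuous <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_distiller(palette = "Blues")

print(p_discrete)
print(p_continuous)
```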
Alternatives to theme_grey()!
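A few of the built-in {ggplot2} themes, shown on an example scatter plot (the mpg data is just a convenient placeholder):

```r
library(ggplot2)

# Base plot; without a theme call it uses the default theme_grey().
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Swap the theme by adding a different complete theme to the plot.
p_minimal <- p + theme_minimal()
p_bw      <- p + theme_bw()
p_classic <- p + theme_classic()

print(p_minimal)
print(p_bw)
print(p_classic)
```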
or use {ggtext} to colour font in the subtitle…
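A minimal sketch of the {ggtext} approach, assuming the mpg data and three hex colours picked for illustration: the subtitle is written with HTML spans, and element_markdown() renders them so the subtitle can replace the legend.

```r
library(ggplot2)
library(ggtext)

# Colours assumed for illustration; reuse whatever palette the chart uses.
cols <- c("4" = "#1B9E77", "f" = "#D95F02", "r" = "#7570B3")

# Subtitle with each group name wrapped in a span of the matching colour.
subtitle_md <- paste0(
  "<span style='color:", cols[["4"]], ";'>Four-wheel</span>, ",
  "<span style='color:", cols[["f"]], ";'>front-wheel</span> and ",
  "<span style='color:", cols[["r"]], ";'>rear-wheel</span> drive"
)

p_subtitle <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  scale_colour_manual(values = cols) +
  labs(title = "Engine size and fuel efficiency", subtitle = subtitle_md) +
  theme(
    plot.subtitle   = element_markdown(),  # render the HTML spans
    legend.position = "none"               # the subtitle now acts as the legend
  )

print(p_subtitle)
```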
Arial: Does it pass the 1Il test?
Times New Roman: Does it pass the 1Il test?
Courier New: Does it pass the 1Il test?
Font size: larger fonts are (usually) better
Font colour: ensure sufficient contrast
Font face: highlight text using bold font, avoid italics
Font family: choose a clear font with distinguishable features (pick something familiar)
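These four settings map onto {ggplot2} theme arguments roughly as follows. This is a sketch only: the sizes, colours, and the generic "sans" family are assumptions, not recommendations from the slides.

```r
library(ggplot2)

p_text <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Larger engines tend to use more fuel") +
  # Font size and family: set once for the whole plot via the theme.
  theme_minimal(base_size = 16, base_family = "sans") +
  theme(
    # Font face: bold to highlight the title.
    plot.title = element_text(face = "bold"),
    # Font colour: black text on the light background for contrast.
    text      = element_text(colour = "black"),
    axis.text = element_text(colour = "black")
  )

print(p_text)
```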
There is no perfect font.
Charts should have a purpose
Actively design visualisations
Default settings aren’t always the best choices
Fundamentals of data visualization: clauswilke.com/dataviz
R Graphics Cookbook: r-graphics.org
#TidyTuesday: github.com/rfordatascience/tidytuesday
DataWrapper Blog: blog.datawrapper.de
This data set is from a study published in 1757 in A Treatise on the Scurvy in Three Parts, by James Lind.
“Pages 149-153 are a rare gem among what can be generously described as 400+ pages of evidence-free blathering, and these 4 pages may represent the first report of a controlled clinical trial.”
| study_id | treatment | dosing_regimen_for_scurvy | gum_rot_d6 | skin_sores_d6 | weakness_of_the_knees_d6 | lassitude_d6 | fit_for_duty_d6 |
|---|---|---|---|---|---|---|---|
| 001 | cider | 1 quart per day | 2_moderate | 2_moderate | 2_moderate | 2_moderate | 0_no |
| 002 | cider | 1 quart per day | 2_moderate | 1_mild | 2_moderate | 3_severe | 0_no |
| 003 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 1_mild | 3_severe | 3_severe | 3_severe | 0_no |
| 004 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 2_moderate | 3_severe | 3_severe | 3_severe | 0_no |
| 005 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
| 006 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
| 007 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
| 008 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
| 009 | citrus | two lemons and an orange daily | 1_mild | 1_mild | 0_none | 1_mild | 0_no |
| 010 | citrus | two lemons and an orange daily | 0_none | 0_none | 0_none | 0_none | 1_yes |
| 011 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
| 012 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
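One possible starting point for exploring this table, assuming the data is loaded from medicaldata::scurvy with the column names shown above. The numeric gum_rot_score column is a helper added for this sketch.

```r
library(medicaldata)
library(ggplot2)

scurvy <- medicaldata::scurvy

# Severity is stored as labels such as "2_moderate";
# the leading digit gives a 0 to 3 score.
scurvy$gum_rot_score <- as.integer(substr(as.character(scurvy$gum_rot_d6), 1, 1))

# One point per seaman, with the treatments that had the best outcomes at the top.
p_scurvy <- ggplot(
  scurvy,
  aes(x = gum_rot_score, y = reorder(treatment, -gum_rot_score))
) +
  geom_point(size = 3, alpha = 0.6) +  # transparency reveals overlapping points
  labs(
    title = "Gum rot severity after six days of treatment",
    x = "Severity (0 = none, 3 = severe)",
    y = NULL
  ) +
  theme_minimal()

print(p_scurvy)
```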
Edit the code to improve this chart:
Bring your own chart! (Or make a new one!)
Data sources:
library(medicaldata)
#TidyTuesday: github.com/rfordatascience/tidytuesday
GitHub: github.com/nrennie/chicas-data-viz-workshop
Slides: nrennie.github.io/chicas-data-viz-workshop
Questions?